
    The Distributional Learning of Multi-Word Expressions: A Computational Approach

    There has been much recent research in corpus and computational linguistics on distributional learning algorithms: computer code that induces latent linguistic structures in corpus data based on co-occurrences of transcribed units in that data. These algorithms have varied applications, from the investigation of human cognitive processes to the corpus extraction of relevant linguistic structures for lexicographic, second language learning, or natural language processing applications, among others. They also operate at various levels of linguistic structure, from phonetics to syntax. One area of research on distributional learning algorithms in which there remains relatively little work is the learning of multi-word, memorized, formulaic sequences, based on the co-occurrences of words. Examples of such multi-word expressions (MWEs) include kick the bucket, New York City, sit down, and as a matter of fact. In this dissertation, I present a novel computational approach to the distributional learning of such sequences in corpora. Entitled MERGE (Multi-word Expressions from the Recursive Grouping of Elements), my algorithm works iteratively by (1) assigning a statistical ‘attraction’ score to each two-word sequence (bigram) in a corpus, based on the individual and co-occurrence frequencies of these two words in that corpus; and (2) merging the highest-scoring bigram into a single, lexicalized unit. These two steps then repeat until some maximum number of iterations or minimum score threshold is reached (since, broadly speaking, the winning score progressively decreases with increasing iterations). Because one (or both) of the ‘words’ making up a winning bigram may be an output merged item from a previous iteration, the algorithm is able to learn MWEs that are in principle of any length (e.g., apple pie versus I’ll believe it when I see it).
Moreover, these MWEs may contain one or more discontinuities of different sizes, up to some maximum size threshold (measured in words) specified by the user (e.g., as _ as in as tall as and as big as). Typically, the extraction of MWEs has been handled by algorithms that identify only continuous sequences, and in which the user must specify the length(s) of the sequences to be extracted beforehand; MERGE instead offers a bottom-up, distributional approach that addresses both issues. In the present dissertation, in addition to describing the algorithm, I report three rating experiments and one corpus-based early child language study that validate the efficacy of MERGE in identifying MWEs. In one experiment, participants rate sequences extracted from a corpus by the algorithm for how well they instantiate true MWEs. As expected, the results reveal that the high-scoring output items that MERGE identifies early in its iterative process are rated as ‘good’ MWEs by participants (based on certain subjective criteria), with the quality of these ratings decreasing for output from later iterations (i.e., output items that were scored lower by the algorithm). In the other two experiments, participants rate high-ranking output both from MERGE and from an existing algorithm from the literature that also learns MWEs of various lengths, the Adjusted Frequency List (Brook O’Donnell 2011). Comparison of participant ratings reveals that the items that MERGE acquires are rated more highly than those acquired by the Adjusted Frequency List, suggesting that MERGE is a performance frontrunner among distributional learning algorithms of MWEs.
More broadly, the experiments together suggest that MERGE acquires representations that are compatible with adult knowledge of formulaic language, and thus it may be useful for any number of research applications that rely on such formulaic language as a unit of analysis. Finally, in a study using two corpora of caregiver-child interactions, I run MERGE on caregiver utterances and then show that, of the MWEs induced by the algorithm, those that go on to be acquired later by the children receive higher scores from the algorithm than those that are not later learned. These results suggest that, when applied to acquisition data, the algorithm is useful for identifying the structures of statistical co-occurrences in the caregiver input that are relevant to children in their acquisition of early multi-word knowledge. Overall, MERGE is shown to be a powerful computational approach to the distributional learning and extraction of MWEs, both when modeling adult knowledge of formulaic language and when accounting for the early multi-word structures acquired by children.
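The two-step loop described in the abstract above (score every bigram, merge the top scorer into one unit, repeat until an iteration cap or score threshold) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the dissertation's implementation. The attraction score here is Dunning's log-likelihood (G²), chosen only as a plausible stand-in. Only contiguous bigrams are considered, so the gapped MWEs (as _ as) that MERGE supports are out of scope. The function names `attraction`, `count_bigrams`, `merge_pair`, and `merge` are hypothetical.

```python
import math
from collections import Counter


def attraction(c12, c1, c2, n):
    """Log-likelihood (G^2) attraction of bigram (w1, w2); an assumed stand-in score.

    c12: bigram frequency; c1: count of w1 as a left element;
    c2: count of w2 as a right element; n: total number of bigram tokens.
    """
    expected_12 = c1 * c2 / n
    if c12 <= expected_12:
        return 0.0  # not attracted: co-occurs no more often than chance predicts
    observed = [c12, c1 - c12, c2 - c12, n - c1 - c2 + c12]
    expected = [expected_12, c1 * (n - c2) / n, (n - c1) * c2 / n, (n - c1) * (n - c2) / n]
    # Empty cells contribute 0 to G^2 (the limit of o * log(o / e) as o -> 0).
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)


def count_bigrams(corpus):
    """Count adjacent pairs plus left- and right-position unit frequencies."""
    bigrams, lefts, rights = Counter(), Counter(), Counter()
    for sent in corpus:
        for a, b in zip(sent, sent[1:]):
            bigrams[(a, b)] += 1
            lefts[a] += 1
            rights[b] += 1
    return bigrams, lefts, rights


def merge_pair(sent, pair, unit):
    """Replace non-overlapping occurrences of `pair` in `sent`, scanning left to right."""
    out, i = [], 0
    while i < len(sent):
        if i + 1 < len(sent) and (sent[i], sent[i + 1]) == pair:
            out.append(unit)
            i += 2
        else:
            out.append(sent[i])
            i += 1
    return out


def merge(corpus, max_iter=10, min_score=0.0):
    """Iteratively merge the most attracted bigram; return [(unit, score), ...]."""
    learned = []
    for _ in range(max_iter):
        bigrams, lefts, rights = count_bigrams(corpus)
        n = sum(bigrams.values())
        if n == 0:
            break
        best, best_score = None, min_score
        for (a, b), c12 in bigrams.items():
            score = attraction(c12, lefts[a], rights[b], n)
            if score > best_score:
                best, best_score = (a, b), score
        if best is None:
            break  # no bigram exceeds the minimum score threshold
        unit = f"{best[0]} {best[1]}"  # merged units can be merged again later
        learned.append((unit, best_score))
        corpus = [merge_pair(sent, best, unit) for sent in corpus]
    return learned


corpus = [s.split() for s in [
    "new york city is big",
    "i love new york city",
    "new york city at night",
    "the city is big",
]]
for unit, score in merge(corpus, max_iter=3):
    print(f"{unit!r}: {score:.2f}")
```

On this toy corpus the sketch learns "new york", then "is big", then "new york city". The third unit is built from a previously merged item, and each winning score is lower than the one before.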

    The Long-Baseline Neutrino Experiment: Exploring Fundamental Symmetries of the Universe

    The preponderance of matter over antimatter in the early Universe, the dynamics of the supernova bursts that produced the heavy elements necessary for life, and whether protons eventually decay: these mysteries at the forefront of particle physics and astrophysics are key to understanding the early evolution of our Universe, its current state, and its eventual fate. The Long-Baseline Neutrino Experiment (LBNE) represents an extensively developed plan for a world-class experiment dedicated to addressing these questions. LBNE is conceived around three central components: (1) a new, high-intensity neutrino source generated from a megawatt-class proton accelerator at Fermi National Accelerator Laboratory, (2) a near neutrino detector just downstream of the source, and (3) a massive liquid argon time-projection chamber deployed as a far detector deep underground at the Sanford Underground Research Facility. This facility, located at the site of the former Homestake Mine in Lead, South Dakota, is approximately 1,300 km from the neutrino source at Fermilab, a distance (baseline) that delivers optimal sensitivity to neutrino charge-parity symmetry violation and mass ordering effects. This ambitious yet cost-effective design incorporates scalability and flexibility and can accommodate a variety of upgrades and contributions. With its exceptional combination of experimental configuration, technical capabilities, and potential for transformative discoveries, LBNE promises to be a vital facility for the field of particle physics worldwide, providing physicists from around the globe with opportunities to collaborate in a twenty- to thirty-year program of exciting science. In this document we provide a comprehensive overview of LBNE's scientific objectives, its place in the landscape of neutrino physics worldwide, the technologies it will incorporate, and the capabilities it will possess. Comment: Major update of previous version.
This is the reference document for the LBNE science program and current status. Chapters 1, 3, and 9 provide a comprehensive overview of LBNE's scientific objectives, its place in the landscape of neutrino physics worldwide, the technologies it will incorporate, and the capabilities it will possess. 288 pages, 116 figures

    LSST: from Science Drivers to Reference Design and Anticipated Data Products

    (Abridged) We describe here the most ambitious survey currently planned in the optical, the Large Synoptic Survey Telescope (LSST). A vast array of science will be enabled by a single wide-deep-fast sky survey, and LSST will have unique survey capability in the faint time domain. The LSST design is driven by four main science themes: probing dark energy and dark matter, taking an inventory of the Solar System, exploring the transient optical sky, and mapping the Milky Way. LSST will be a wide-field ground-based system sited at Cerro Pachón in northern Chile. The telescope will have an 8.4 m (6.5 m effective) primary mirror, a 9.6 deg² field of view, and a 3.2-gigapixel camera. The standard observing sequence will consist of pairs of 15-second exposures in a given field, with two such visits in each pointing in a given night. With these repeats, the LSST system is capable of imaging about 10,000 square degrees of sky in a single filter in three nights. The typical 5σ point-source depth in a single visit in r will be ∼24.5 (AB). The project is in the construction phase and will begin regular survey operations by 2022. The survey area will be contained within 30,000 deg² with δ < +34.5°, and will be imaged multiple times in six bands, ugrizy, covering the wavelength range 320–1050 nm. About 90% of the observing time will be devoted to a deep-wide-fast survey mode, which will uniformly observe an 18,000 deg² region about 800 times (summed over all six bands) during the anticipated 10 years of operations and yield a coadded map to r ∼ 27.5. The remaining 10% of the observing time will be allocated to projects such as a Very Deep and Fast time domain survey.
The goal is to make LSST data products, including a relational database of about 32 trillion observations of 40 billion objects, available to the public and scientists around the world. Comment: 57 pages, 32 color figures, version with high-resolution figures available from https://www.lsst.org/overvie
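The survey numbers quoted in the LSST abstract can be checked with a short worked calculation. It uses only figures from the abstract: one visit is a pair of 15 s exposures, there are about 800 visits per field over all bands, and the single-visit depth is r ∼ 24.5. Two assumptions are added and are not from the abstract. First, the coadd is background-limited, so S/N grows as √N. Second, visits are split equally across the six bands, which is a simplification.

```latex
\begin{align}
t_{\mathrm{field}} &\approx 800 \times (2 \times 15\,\mathrm{s}) = 24\,000\,\mathrm{s} \approx 6.7\,\mathrm{h},\\
\Delta m_{5\sigma} &= 2.5\log_{10}\sqrt{N} = 1.25\log_{10} N,\\
N \approx 800/6 \approx 133 \;&\Rightarrow\; \Delta m_{5\sigma} \approx 1.25\log_{10} 133 \approx 2.7,\\
r_{\mathrm{coadd}} &\approx 24.5 + 2.7 \approx 27.2 .
\end{align}
```

Under these idealizations the estimate lands close to the quoted r ∼ 27.5. The remaining gap plausibly reflects the non-uniform band allocation and other details this estimate ignores.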

    Filovirus RefSeq Entries: Evaluation and Selection of Filovirus Type Variants, Type Sequences, and Names

    Sequence determination of complete or coding-complete genomes of viruses is becoming common practice for supporting the work of epidemiologists, ecologists, virologists, and taxonomists. Sequencing duration and costs are rapidly decreasing, sequencing hardware is under modification for use by non-experts, and software is constantly being improved to simplify sequence data management and analysis. Thus, analysis of virus disease outbreaks on the molecular level is now feasible, including characterization of the evolution of individual virus populations in single patients over time. The increasing accumulation of sequencing data creates a management problem for the curators of commonly used sequence databases and an entry retrieval problem for end users. Therefore, utilizing the data to their fullest potential will require setting nomenclature and annotation standards for virus isolates and associated genomic sequences. The National Center for Biotechnology Information’s (NCBI’s) RefSeq is a non-redundant, curated database for reference (or type) nucleotide sequence records that supplies source data to numerous other databases. Building on recently proposed templates for filovirus variant naming [<virus name> (<strain>)/<isolation host-suffix>/<country of sampling>/<year of sampling>/<genetic variant designation>-<isolate designation>], we report consensus decisions from a majority of past and currently active filovirus experts on the eight filovirus type variants and isolates to be represented in RefSeq, their final designations, and their associated sequences.

    Phosphoinositide-3 Kinase-Akt Pathway Controls Cellular Entry of Ebola Virus

    The phosphoinositide-3 kinase (PI3K) pathway regulates diverse cellular activities related to cell growth, migration, survival, and vesicular trafficking. It is known that Ebola virus requires endocytosis to establish an infection. However, the cellular signals that mediate this uptake were unknown for Ebola virus as well as for many other viruses. Here, the involvement of PI3K in Ebola virus entry was studied. A novel and critical role of the PI3K signaling pathway was demonstrated in cell entry of Zaire Ebola virus (ZEBOV). Inhibitors of PI3K and Akt significantly reduced infection by ZEBOV at an early step during the replication cycle. Furthermore, phosphorylation of Akt-1 was induced shortly after exposure of cells to radiation-inactivated ZEBOV, indicating that the virus actively induces the PI3K pathway and that replication was not required for this induction. Subsequent use of pseudotyped Ebola virus and/or Ebola virus-like particles, in a novel virus entry assay, provided evidence that activity of PI3K/Akt is required at the virus entry step. Class 1A PI3Ks appear to play a predominant role in regulating ZEBOV entry, and Rac1 is a key downstream effector in this regulatory cascade. Confocal imaging of fluorescently labeled ZEBOV indicated that inhibition of PI3K, Akt, or Rac1 disrupted normal uptake of virus particles into cells and resulted in aberrant accumulation of virus in a cytosolic compartment that was non-permissive for membrane fusion. We conclude that PI3K-mediated signaling plays an important role in regulating vesicular trafficking of ZEBOV necessary for cell entry. Disruption of this signaling leads to inappropriate trafficking within the cell and a block in steps leading to membrane fusion. These findings extend our current understanding of the Ebola virus entry mechanism and may help in devising useful new strategies for treatment of Ebola virus infection.

    Taxonomy of the order Mononegavirales : update 2016

    In 2016, the order Mononegavirales was emended through the addition of two new families (Mymonaviridae and Sunviridae), the elevation of the paramyxoviral subfamily Pneumovirinae to family status (Pneumoviridae), the addition of five free-floating genera (Anphevirus, Arlivirus, Chengtivirus, Crustavirus, and Wastrivirus), and several other changes at the genus and species levels. This article presents the updated taxonomy of the order Mononegavirales as now accepted by the International Committee on Taxonomy of Viruses (ICTV).

    Non-invasive cardiac assessment in high risk patients (The GROUND study): rationale, objectives and design of a multi-center randomized controlled clinical trial

    Background: Peripheral arterial disease (PAD) is a common disease associated with a considerably increased risk of future cardiovascular events, and most of these patients will die from coronary artery disease (CAD). Screening for silent CAD has become an option with recent non-invasive developments in CT (computed tomography) angiography and MR (magnetic resonance) stress testing. Screening in combination with more aggressive treatment may improve prognosis. Therefore, we propose to study whether a cardiac imaging algorithm, using non-invasive imaging techniques followed by treatment, will reduce the risk of cardiovascular disease in PAD patients free from cardiac symptoms. Design: The GROUND study is designed as a prospective, multi-center, randomized clinical trial. Patients with peripheral arterial disease but without symptomatic cardiac disease will be asked to participate. All patients receive appropriate risk factor management before randomization. Half of the recruited patients will enter the 'control group' and only undergo CT calcium scoring. The other half of the recruited patients (index group) will undergo the non-invasive cardiac imaging algorithm followed by evidence-based treatment. First, patients are submitted to CT calcium scoring and CT angiography. Patients with a left main (or equivalent) coronary artery stenosis of > 50% on CT will be referred to a cardiologist without further imaging. All other patients in this group will undergo dobutamine stress magnetic resonance (DSMR) testing. Patients with a DSMR positive for ischemia will also be referred to a cardiologist. These patients are candidates for conventional coronary angiography and cardiac interventions (coronary artery bypass grafting (CABG) or percutaneous cardiac interventions (PCI)), if indicated. All participants of the trial will enter a 5-year follow-up period for the occurrence of cardiovascular events. Sequential interim analyses will take place.
Based on sample size calculations, about 1200 patients are needed to detect a 24% reduction in the primary outcome. Implications: The GROUND study will provide insight into whether non-invasive cardiac imaging reduces the risk of cardiovascular events in patients with peripheral arterial disease but without symptoms of coronary artery disease. Trial registration: Clinicaltrials.gov NCT0018911
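The GROUND design above is effectively a decision procedure, so it can be expressed as a small function that returns the ordered study steps for one participant. This is a hypothetical Python sketch: the function name, parameter names, and step labels are illustrative and are not taken from the trial protocol. Participants who are not referred simply proceed to the common 5-year follow-up described in the abstract.

```python
from typing import List, Optional


def ground_pathway(group: str,
                   left_main_stenosis_pct: Optional[float] = None,
                   dsmr_ischemia: Optional[bool] = None) -> List[str]:
    """Return the ordered study steps for one participant (hypothetical sketch).

    group: "control" or "index".
    left_main_stenosis_pct: left main (or equivalent) stenosis on CT angiography.
    dsmr_ischemia: whether dobutamine stress MR shows ischemia.
    """
    steps = ["risk factor management", "randomization"]
    if group == "control":
        # Control arm: CT calcium scoring only.
        steps.append("CT calcium scoring")
    elif group == "index":
        steps += ["CT calcium scoring", "CT angiography"]
        if left_main_stenosis_pct is None:
            raise ValueError("index group requires a CT angiography result")
        if left_main_stenosis_pct > 50:
            # Referred directly, without further imaging.
            steps.append("referral to cardiologist")
        else:
            if dsmr_ischemia is None:
                raise ValueError("a DSMR result is required when stenosis <= 50%")
            steps.append("DSMR")
            if dsmr_ischemia:
                steps.append("referral to cardiologist")
    else:
        raise ValueError(f"unknown group: {group!r}")
    steps.append("5-year follow-up")  # all participants are followed up
    return steps


print(ground_pathway("index", left_main_stenosis_pct=30, dsmr_ischemia=True))
```

Raising an error for a missing test result keeps the sketch from silently skipping a step that the design requires.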

    Virus nomenclature below the species level : a standardized nomenclature for laboratory animal-adapted strains and variants of viruses assigned to the family Filoviridae

    The International Committee on Taxonomy of Viruses (ICTV) organizes the classification of viruses into taxa, but is not responsible for the nomenclature for taxa members. International expert groups, such as the ICTV Study Groups, recommend the classification and naming of viruses and their strains, variants, and isolates. The ICTV Filoviridae Study Group has recently introduced an updated classification and nomenclature for filoviruses. Subsequently, and together with numerous other filovirus experts, a consistent nomenclature for their natural genetic variants and isolates was developed that aims at simplifying the retrieval of sequence data from electronic databases. This is a first important step toward a viral genome annotation standard as sought by the US National Center for Biotechnology Information (NCBI). Here, this work is extended to include filoviruses obtained in the laboratory by artificial selection through passage in laboratory hosts. The previously developed template for natural filovirus genetic variant naming (<virus name> (<strain>/)<isolation host-suffix>/<country of sampling>/<year of sampling>/<genetic variant designation>-<isolate designation>) is retained, but it is proposed to adapt the type of information added to each field for laboratory animal-adapted variants. For instance, the full-length designation of an Ebola virus Mayinga variant adapted at the State Research Center for Virology and Biotechnology “Vector” to cause disease in guinea pigs after seven passages would be akin to “Ebola virus VECTOR/C.porcellus-lab/COD/1976/Mayinga-GPA-P7”. As was proposed for the names of natural filovirus variants, we suggest using the full-length designation in databases, as well as in the methods section of publications. Shortened designations (such as “EBOV VECTOR/C.por/COD/76/May-GPA-P7”) and abbreviations (such as “EBOV/May-GPA-P7”) could be used in the remainder of the text, depending on how critical it is to convey information contained in the full-length name.
“EBOV” would suffice if only one EBOV strain/variant/isolate is addressed. This work was funded in part by the Joint Science and Technology Office for Chem Bio Defense (proposal #TMTI0048_09_RD_T to SB). http://www.springerlink.com/content/0304-8608/

    Virus nomenclature below the species level : a standardized nomenclature for filovirus strains and variants rescued from cDNA

    Specific alterations (mutations, deletions, insertions) of virus genomes are crucial for the functional characterization of their regulatory elements and their expression products, as well as a prerequisite for the creation of attenuated viruses that could serve as vaccine candidates. Virus genome tailoring can be performed either by using traditionally cloned genomes as starting materials, followed by site-directed mutagenesis, or by de novo synthesis of modified virus genomes or parts thereof. A systematic nomenclature for such recombinant viruses is necessary to set them apart from wild-type and laboratory-adapted viruses, and to improve communication and collaborations among researchers who may want to use recombinant viruses or create novel viruses based on them. A large group of filovirus experts has recently proposed nomenclatures for natural and laboratory animal-adapted filoviruses that aim to simplify the retrieval of sequence data from electronic databases. Here, this work is extended to include nomenclature for filoviruses obtained in the laboratory via reverse genetics systems. The previously developed template for natural filovirus genetic variant naming, <virus name> (<strain>/)<isolation host-suffix>/<country of sampling>/<year of sampling>/<genetic variant designation>-<isolate designation>, is retained, but we propose to adapt the type of information added to each field for cDNA clone-derived filoviruses. For instance, the full-length designation of an Ebola virus Kikwit variant rescued from a plasmid developed at the US Centers for Disease Control and Prevention could be akin to “Ebola virus H.sapiens-rec/COD/1995/Kikwit-abc1” (with the suffix “rec” identifying the recombinant nature of the virus and “abc1” being a placeholder for any meaningful isolate designator). Such a full-length designation should be used in databases and the methods section of publications.
Shortened designations (such as “EBOV H.sap/COD/95/Kik-abc1”) and abbreviations (such as “EBOV/Kik-abc1”) could be used in the remainder of the text, depending on how critical it is to convey information contained in the full-length name. “EBOV” would suffice if only one EBOV strain/variant/isolate is addressed. http://link.springer.com/journal/705
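The filovirus naming template is a fixed sequence of fields, so a small Python sketch that composes designations from those fields makes the structure concrete. The class and method names are hypothetical. The full-length format follows the template and the two full-length examples in these abstracts, with the strain segment optional. The shortening rules are only inferred from the two shortened examples given here: genus initial plus the first three letters of the species, host suffix dropped, two-digit year, and the first three letters of the variant. They are not presented as the committee's formal rules.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FilovirusVariant:
    virus_name: str      # e.g. "Ebola virus"
    abbreviation: str    # e.g. "EBOV"
    host: str            # isolation host, e.g. "H.sapiens"
    suffix: str          # e.g. "rec" (recombinant) or "lab" (lab-adapted)
    country: str         # country of sampling, e.g. "COD"
    year: int            # year of sampling
    variant: str         # genetic variant designation, e.g. "Kikwit"
    isolate: str         # isolate designation, e.g. "abc1"
    strain: Optional[str] = None  # optional strain segment, e.g. "VECTOR"

    def _prefix(self, name: str) -> str:
        # The "(<strain>/)" segment is included only when a strain is given.
        return f"{name} {self.strain}/" if self.strain else f"{name} "

    def full(self) -> str:
        return (f"{self._prefix(self.virus_name)}{self.host}-{self.suffix}/"
                f"{self.country}/{self.year}/{self.variant}-{self.isolate}")

    def short(self) -> str:
        # Shortening rules inferred from the examples only (an assumption).
        genus, species = self.host.split(".", 1)
        return (f"{self._prefix(self.abbreviation)}{genus}.{species[:3]}/"
                f"{self.country}/{self.year % 100:02d}/{self.variant[:3]}-{self.isolate}")

    def abbreviated(self) -> str:
        return f"{self.abbreviation}/{self.variant[:3]}-{self.isolate}"


kikwit = FilovirusVariant("Ebola virus", "EBOV", "H.sapiens", "rec", "COD", 1995, "Kikwit", "abc1")
mayinga = FilovirusVariant("Ebola virus", "EBOV", "C.porcellus", "lab", "COD", 1976,
                           "Mayinga", "GPA-P7", strain="VECTOR")
for v in (kikwit, mayinga):
    print(v.full(), "|", v.short(), "|", v.abbreviated())
```

Keeping every form derived from one record means the full-length, shortened, and abbreviated names cannot drift apart.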

    Sex differences in cerebral venous sinus thrombosis after adenoviral vaccination against COVID-19

    Introduction: Cerebral venous sinus thrombosis associated with vaccine-induced immune thrombotic thrombocytopenia (CVST-VITT) is a severe disease with high mortality. There are few data on sex differences in CVST-VITT. The aim of our study was to investigate the differences in presentation, treatment, clinical course, complications, and outcome of CVST-VITT between women and men. Patients and methods: We used data from an ongoing international registry on CVST-VITT. VITT was diagnosed according to the Pavord criteria. We compared the characteristics of CVST-VITT in women and men. Results: Of 133 patients with possible, probable, or definite CVST-VITT, 102 (77%) were women. Women were slightly younger [median age 42 (IQR 28–54) vs 45 (28–56)], presented more often with coma (26% vs 10%), and had a lower platelet count at presentation [median (IQR) 50×10⁹/L (28–79) vs 68 (30–125)] than men. The nadir platelet count was lower in women [median (IQR) 34 (19–62) vs 53 (20–92)]. More women received endovascular treatment than men (15% vs 6%). Rates of treatment with intravenous immunoglobulins were similar (63% vs 66%), as were new venous thromboembolic events (14% vs 14%) and major bleeding complications (30% vs 20%). Rates of good functional outcome (modified Rankin Scale 0–2, 42% vs 45%) and in-hospital death (39% vs 41%) did not differ. Discussion and conclusions: Three quarters of CVST-VITT patients in this study were women. Women were more severely affected at presentation, but clinical course and outcome did not differ between women and men. VITT-specific treatments were overall similar, but more women received endovascular treatment.